System and method for determining an event occurrence rate

ABSTRACT

Described are a system and method for determining an event occurrence rate. A sample set of content items may be obtained. Each of the content items may be associated with at least one region in a hierarchical data structure. A first impression volume may be determined for the at least one region as a function of a number of impressions registered for the content items associated with the at least one region. A scale factor may be applied to the first impression volume to generate a second impression volume. The scale factor may be selected so that the second impression volume is within a predefined range of a third impression volume. A click-through-rate (CTR) may be estimated as a function of the second impression volume and a number of clicks on the content item.

CLAIM OF PRIORITY

This application is a continuation of and claims priority to U.S. Ser. No. 11/696,944, filed Apr. 5, 2007, which is hereby incorporated by reference in its entirety.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

CROSS REFERENCE TO RELATED APPLICATION

The present application is related to co-pending U.S. patent application Ser. No. 11/637,524, entitled “SYSTEM AND METHOD FOR MATCHING OBJECTS BELONGING TO HIERARCHIES,” filed on Dec. 12, 2006, and published under US Publication No. 2008/0140591, the disclosure of which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The invention disclosed herein relates generally to determining an event occurrence rate. More specifically, the present invention relates to estimating an occurrence rate for events aggregated at multiple resolutions through hierarchical data structures.

BACKGROUND OF THE INVENTION

Web advertising is typically implemented according to two general schemes: content match and sponsored search. Content match refers to placement of advertisements (“ads”) within a webpage on the basis of the content of the webpage. Sponsored search refers to placing ads on a search results page generated by a web search engine, the ads being responsive to a query that a given user submits to the web search engine. The ads placed on the search results page are selected via analysis of a query string entered into the web search engine. Those of skill in the art recognize that other factors or parameters beyond the query string may influence the selection of ads for placement on a search results page that the web search engine generates, including a score that indicates the quality of the ad, a time zone of the user, user browsing history, demographic information, etc. A content match system can generate data indicating each instance that an ad is displayed on a webpage (an “impression”).

An ad network, an intermediary entity that selects the ad in the content match system, determines a most relevant ad to place on the webpage to entice a user to click on that ad. For example, on a webpage related to sports, the ad network may select ads for soft drinks, because a demographic of visitors interested in sports may be substantially similar to a demographic likely to buy soft drinks. By computing a ratio of a number of clicks on the ads to a number of impressions, the ad network can determine a click-through-rate (CTR) indicative of, inter alia, the relevancy of the ads that are selected. Thus, the CTR becomes a valuable indicator for ad networks seeking to attract business from advertisers. However, the number of clicks is typically very low compared to the number of impressions. Conventional estimation algorithms based on frequencies of event occurrences incur high statistical variance and fail to provide satisfactory predictions of the CTR because the number of clicks appears negligible in view of the large number of impressions. Furthermore, estimating the CTR from the entire corpus of data might involve storing information for each impression. In a content matching system, this might involve crawling pages and storing the entire page content, which is expensive in terms of both storage and bandwidth requirements.

Therefore, there exists a need for a reliable sampling model for determining an occurrence of a rare event within large volumes of data.

SUMMARY OF THE INVENTION

The present invention generally relates to systems and methods for determining an event occurrence rate. A sample set of content items may be obtained. Each of the content items may be associated with at least one region in a hierarchical data structure. According to one embodiment, a hierarchical data structure comprises nodes in an advertisement taxonomy hierarchy and nodes in a page taxonomy hierarchy, with a given region characterized or otherwise identified by a combination of nodes from the advertisement taxonomy hierarchy and nodes from the page taxonomy hierarchy. A first impression volume may be determined for the at least one region as a function of a number of impressions registered for the content items associated with the at least one region. A scale factor may be applied to the first impression volume to generate a second impression volume. The scale factor may be selected so that the second impression volume is within a predefined range of a third impression volume. A click-through-rate (CTR) may be estimated as a function of the second impression volume and a number of clicks on the content item.

The content items may include at least one of webpages and ads. The obtaining of the sample set may include identifying first content items that have been clicked, identifying a predetermined number of second content items that have not been clicked, and generating the sample set as a function of the first and second content items. The first impression volume may be calculated as a function of the impressions for the first and second content items. The third impression volume may be a total number of impressions associated with a pre-selected level in the hierarchical data structure. A difference impression volume may be calculated as a difference between the first impression volume and the third impression volume, and the difference impression volume may be distributed to the at least one region as a function of the first impression volume. The distributing may include determining a sum of the first impression volumes for each region across a level of the hierarchical data structure, computing a ratio of the first impression volume for a given region to the sum, multiplying the difference impression volume by the ratio to determine an impression addition for the given region, and adding the impression addition to the first impression volume of the given region to generate a fourth impression volume. Estimating the CTR may include assigning a state variable to each of the at least one region, and applying a Markovian model to the state variable to estimate the CTR. The Markovian model may compute a posterior for the state using a Kalman filter, propagate the posterior to the at least one region, and repeat the computing and the propagating until convergence. Upon the convergence, the CTR for the at least one region may be identified and stored on a storage medium.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated in the figures of the accompanying drawings which are meant to be exemplary and not limiting, in which like references are intended to refer to like or corresponding parts, and in which:

FIG. 1 shows an exemplary embodiment of a system for determining an event occurrence rate according to one embodiment of the present invention;

FIG. 2 shows an exemplary embodiment of a method for determining an event occurrence rate according to one embodiment of the present invention;

FIG. 3 shows an exemplary embodiment of a method for generating a sample set of webpages/ads according to one embodiment of the present invention;

FIG. 4 shows an exemplary embodiment of a method for determining impression volumes at a predetermined node(s) in a webpage/ad hierarchy according to one embodiment of the present invention;

FIG. 5 shows an exemplary embodiment of a method for estimating a click-through-rate in one or more regions of a webpage/ad hierarchy according to one embodiment of the present invention; and

FIG. 6 shows an exemplary embodiment of a generative model for a two-level hierarchy according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following description of the preferred embodiments, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.

FIG. 1 shows an exemplary embodiment of a system 100 for determining an event occurrence rate according to the present invention. The system 100 may comprise a publisher server 102, an ad network server 104 and a client device 106 which are communicatively interconnected via a communications network 108 (e.g., a wired/wireless LAN/WAN, a cellular network, the Internet, an intranet, a VPN, a PSTN, etc.). The publisher and ad network servers 102, 104 and the client device 106 may be processor-based computing devices which include memory and network connection ports for communicating data on the network 108. For example, the client device 106 may be a PC, laptop, mobile phone, PDA, tablet computer, handheld computer, smart appliance (e.g., scanner, copier, facsimile machine), etc. which utilizes a web browser (or command-line interface) for allowing a user to interface with devices on the network 108 and view content items (e.g., webpages, ads, videos, audio files, etc.). Those of skill in the art understand that any number of client devices 106 may be connected to the network 108 and that the servers 102 may comprise any number of servers and/or databases.

The publisher server 102 may host one or more webpages that include text, audio, video and/or interactive content (e.g., games, Flash programs, etc.). The webpages may also include ad space (e.g., blank space on the webpage in which an ad may be displayed). A company operating the publisher server 102 may generate revenue by displaying the ads on the webpages. The ads may be hosted by the ad network server 104 or an ad company server 110 (e.g., a repository with company/product-specific ads). When the browser on the client device 106 requests the webpage from the publisher server 102, the ad network server 104 selects an ad (usually based on an agreement with the website owner and the advertiser) from its own database (or retrieves the selected ad from the ad company server 110) and transmits the selected ad to the client device 106. Displaying the ad on the webpage is typically referred to as an “impression.” The user then sees the selected ad as a part of the webpage that was requested.

Along with using rules defined in website owner-advertiser agreements to select ads, the ad network server 104 may also implement a content match application. The content match application may include a crawler module which indexes content on various webpages and ads available to be served by the ad network server 104. Using the indices, the ad network server 104 may select an ad that is most likely to be clicked by the user. The ad network server 104 may generate data recording the impressions and the clicks on served ads for calculating a click-through-rate (CTR), e.g., a percentage of ads that were served and clicked. The CTR may be a valuable statistic for the ad network to demonstrate to advertisers the efficacy of the content match application.

In an exemplary embodiment of the present invention, the CTR may be estimated at one or more resolutions of a webpage/ad hierarchy. That is, the webpages and ads may be classified (manually or automatically) into a pre-existing hierarchy in which nodes in the hierarchy are associated with contextual themes (e.g., skiing→winter sports→sports). The webpages/ads may be associated with a given node based on the resolution thereof. That is, the more themes used to describe a webpage/ad, the further to the fringe the webpage/ad will be in the hierarchy.

While the exemplary embodiments will be described with reference to a single hierarchy used by both the webpages and the ads, those of skill in the art will understand that the webpages and the ads may utilize mutually exclusive and/or overlapping hierarchies. The hierarchy may be a tree comprising a single root node that extends into a plurality of leaf nodes. One or more of the leaf nodes may be identified as comprising a region of the tree. For example, a parent node and its children nodes, a plurality of nodes with a common ancestor node or sharing a common theme may be considered a region, or a region may be identified by the contextual theme (e.g., swimming→summer sports→sports).

FIG. 2 shows an exemplary embodiment of a method 200 for determining an event occurrence rate according to the present invention. The method 200 provides an overview of the exemplary steps for determining an event occurrence rate according to the present invention, and, as such, implementation of each of the steps will be described in further detail below. In step 202, a sample set of webpages is identified. The sample set may include a predetermined number of webpages on which ads have been served by the ad network server 104, including the webpages/ads that have been clicked. The webpages in the sample set may be gleaned from, for example, a log maintained by the ad network server 104. In step 204, impression volumes are determined, based on true impression data for the webpages in the sample set, for regions in the hierarchy. In step 206, the determined impression volumes and actual numbers of clicks in a given region are used to determine the CTR(s) for the region. The CTR may be computed for any resolution within the hierarchy, allowing for discrimination between regions which truly have negligible CTR and those which may obtain more clicks if provided with more impressions, as will be explained further below.

FIG. 3 shows an exemplary embodiment of a method 300 for generating the sample set of webpages/ads. In step 302, a webpage is identified. The webpage may be one of the webpages that was indicated on the log at the ad network server 104 as registering an impression. In step 304, it is determined whether the webpage was clicked. If the webpage was not clicked, it is determined whether the webpage should be included in the sample set as a non-clicked webpage (step 306). A number of the non-clicked webpages included in the sample set may be predetermined or determined automatically when harvesting the pages (e.g., as a function of a total number of webpages in the log, a total number of webpages clicked, a total number of impressions, etc.).
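The selection in steps 302 through 306 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the log schema (records with `page`, `impressions` and `clicks` fields) and the function name `build_sample_set` are hypothetical, and the non-clicked webpages are assumed to be drawn uniformly at random.

```python
import random


def build_sample_set(log, num_non_clicked, seed=0):
    """Keep every clicked page and a random subset of the non-clicked pages.

    `log` is a list of dicts with hypothetical keys 'page', 'impressions'
    and 'clicks'; `num_non_clicked` is the predetermined number of
    non-clicked pages to include (step 306).
    """
    clicked = [rec for rec in log if rec["clicks"] > 0]
    non_clicked = [rec for rec in log if rec["clicks"] == 0]

    # Assumption: uniform random sampling without replacement.
    rng = random.Random(seed)
    k = min(num_non_clicked, len(non_clicked))
    return clicked + rng.sample(non_clicked, k)
```

Keeping all clicked webpages preserves the rare events, while the size of the non-clicked subset bounds the number of pages that later need to be crawled.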

In step 308, the page is crawled to obtain features thereof for classification into a region of the hierarchy. The features on a webpage include, but are not limited to, a URL, an HTML tag(s), words, images, scripts, etc. As understood by those of skill in the art, features on the ads may be available from the log or other pre-recorded data identifying (or providing data for identifying) the features.

In step 310, the impressions associated with the webpage are mapped onto regions in the hierarchy corresponding to the features of the webpage. This yields the number of sampled impressions in each of the regions. The method 300 may be iterated over all of the webpages in the sample set, resulting in a hierarchy which reflects all of the sampled impressions in each of the regions. Because the number of impressions associated with the sample set of webpages is relatively small (as compared to the total number of impressions recorded), the hierarchy may not fully reflect true impression volumes for all of the regions.
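The mapping of step 310 can be illustrated with a short Python sketch. It assumes, for illustration only, that each sampled record carries a page classification and an ad classification as root-to-node paths, and that a region at depth d pairs the depth-d page node with the depth-d ad node; the record layout and the name `map_impressions_to_regions` are hypothetical.

```python
from collections import Counter


def map_impressions_to_regions(records):
    """Accumulate sampled impressions in every region a record falls into.

    Each record is (page_path, ad_path, impressions), where the paths are
    tuples of node names ordered from the top level down to the classified
    node. A record contributes to the region at every depth along its paths.
    """
    counts = Counter()
    for page_path, ad_path, impressions in records:
        depth = min(len(page_path), len(ad_path))
        for d in range(1, depth + 1):
            region = (page_path[:d], ad_path[:d])
            counts[region] += impressions
    return counts
```

For example, a record classified as `("sports", "skiing")` on the page side and `("drinks", "soda")` on the ad side contributes its impressions both to the coarse region (sports, drinks) and to the finer region nested within it.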

FIG. 4 shows an exemplary embodiment of a method 400 for determining impression volumes at a predetermined node(s) in the hierarchy using the sampled impressions at another region(s). In step 402, a scale factor is determined, and, in step 404, the scale factor is applied to all nodes across a level of the hierarchy. In step 406, it is determined whether the total number of true impressions indicated in the log is substantially equal to (e.g., within a predefined error bound) the sampled impressions for the predetermined node multiplied by the scale factor. If the totals are incongruous, the scale factor may be modified.

In step 408, a lower bound on impression volume is computed for each of the regions. The lower bound may be, for example, the total number of sampled impressions in each of the respective regions. In step 410, excess impressions (e.g., the total number of scaled impressions in a region minus the lower bound of sampled impressions in the region) may be distributed among the respective regions. That is, by conforming estimated impression volumes to the scaled impression totals at each node in the page and ad hierarchies, a variance of the estimated impression volumes may be reduced. Additionally, a sum of the estimated impression volumes for children regions nested within a parent region should correlate to the estimated impression volume of the parent region. As will be explained further below, the excess impressions may be imputed to some (or all of) the nodes using a maximum entropy formulation.
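The simplest form of this distribution, in which the difference between the true total and the sampled total is shared among the regions of one level in proportion to their sampled impressions, can be sketched in Python as follows. The function name `distribute_excess` and the dictionary layout are hypothetical, and the sketch ignores the row, column and block constraints of the maximum entropy formulation.

```python
def distribute_excess(sampled, true_total):
    """Share the excess impressions among regions proportionally.

    `sampled` maps each region of one level to its sampled impression
    count (the lower bound); `true_total` is the total number of true
    impressions recorded in the log for that level.
    """
    total_sampled = sum(sampled.values())
    excess = true_total - total_sampled  # impressions missed by sampling
    return {
        region: count + excess * count / total_sampled
        for region, count in sampled.items()
    }
```

Under this proportional rule the result equals multiplying every sampled count by the single scale factor `true_total / total_sampled`, so the estimated volumes sum to the true total by construction.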

FIG. 5 shows an exemplary embodiment of a method 500 for estimating the CTR in one or more regions of the hierarchy using the estimated impression volumes. In step 502, a state variable is assigned to each node in the hierarchy. In step 504, a posterior of each of the state variables is computed for each node from fringe leaf nodes to the root node. The posterior may be computed by, for example, a Kalman filter algorithm that “filters” the leaf nodes in a bottom-up fashion to the root node.

In step 506, a smoothing effect may be applied to modify the state variables. The smoothing effect may be the result of applying a Markovian model on the state variables. That is, since the state variables of child nodes sharing a common parent node are drawn from a distribution centered around the state variable of the parent, the Markovian model may specify a joint distribution on an entire state space of CTR values.

In step 508, variance components of the Markovian model may be estimated using, for example, an Expectation-Maximization (EM) algorithm. The EM algorithm may repeat steps 504 (filtering) and 506 (smoothing) for several iterations until convergence (step 510). When convergence is reached, the resulting CTR values may be stored on a storage medium for output and/or additional processing (step 512).

A more detailed exemplary embodiment of determining and imputing impression volumes is described below. A set of regions Z may consist of two successive levels of nested regions, Z⁽¹⁾ and Z⁽²⁾, corresponding to depths 1 and 2, respectively, with generalization to all regions formed by the page and ad hierarchies following directly. Let IJ and ij denote regions in Z⁽¹⁾ and Z⁽²⁾, respectively. The actual impressions in region r from the clicked and non-clicked pages (e.g., as described with reference to the method 300) may be denoted as n_(r) and m_(r), respectively. Thus, lb_(r)=n_(r)+m_(r) may provide a lower bound on the impression volume for the region r. The true impression volume in region r that is to be estimated may be denoted as N_(r). Using the linear transformation x_(r)=N_(r)−lb_(r), the estimation problem may be written in terms of x_(r), and estimates of N_(r) may be derived as $\hat{N}_{r} = \hat{x}_{r} + lb_{r}$, where $\hat{x}_{r}$ is the estimate of x_(r). In fact, the x_(r)'s may be interpreted as excess impressions that may be allocated to adjust for a sampling bias.

A page (or ad) classified to a node i in the tree may belong to the entire path from the node i to the root node. Also, the page (or ad) may be classified to a node at a depth other than the leaf level L. As understood by those of skill in the art, this classification scheme has the potential to create inconsistencies in a total number of impressions and clicks obtained at different levels in the tree. For instance, the total number of impressions (or clicks) for a group of children regions may be strictly smaller than the number of impressions (or clicks) of the parent region they are nested within. To ensure consistency, the excess impressions and clicks in a parent node are distributed among the children nodes associated therewith. The steps are repeated at every level in a top-down fashion. Thus, each impression in a non-leaf region is guaranteed to come from some smaller region nested within it.

One or more constraints may be imposed while imputing the impression volumes as described in the method 400. A first set of constraints (e.g., column constraints) may ensure that a sum of the impressions along a column is substantially equal to a total number of impressions for a corresponding node in the ad hierarchy:

$\sum_{i} x_{ij} = a_{j} - \sum_{i} lb_{ij} = CS_{j}^{(2)}; \quad \text{for all } j \text{ in Level 2} \qquad (1)$

$\sum_{I} x_{IJ} = a_{J} - \sum_{I} lb_{IJ} = CS_{J}^{(1)}; \quad \text{for all } J \text{ in Level 1} \qquad (2)$

In the exemplary column constraint, a_(j) (a_(J)) is the total impression volume for node j (J) in the ad hierarchy, and CS_(.)^((.)) represents the excess impressions in the column that were missed by the sampling process. For a node J at level 1 in the ad hierarchy, a_(J)=Σ_(j:pa(j)=J) a_(j), where pa(j) denotes the parent node of node j, i.e., the column impressions total for a level 1 node is the sum of the column totals of its children in level 2. Also, ΣCS_(j)⁽²⁾=ΣCS_(J)⁽¹⁾=TotExcess, where TotExcess is the total number of excess impressions in the data.

A second set of constraints (e.g., row constraints) may preserve the impression volumes at nodes in the page hierarchy as follows:

$\sum_{j} x_{ij} = K^{(2)} \sum_{j} m_{ij} = RS_{i}^{(2)}; \quad \forall i$

$\sum_{J} x_{IJ} = K^{(1)} \sum_{J} m_{IJ} = RS_{I}^{(1)}; \quad \forall I \qquad (3)$

In the second set of constraints, RS_(.)^((.)) represents the excess impressions aggregated for each node in the page hierarchy, and K⁽¹⁾ and K⁽²⁾ are constants for levels 1 and 2. The underlying assumption is that for each sampled impression, there are K^((.)) times as many excess impressions from the non-clicked pool that did not appear in the sample. Since pages may be randomly sampled from the non-clicked pool, this simple adjustment is reasonable. The constants K^((.)) are chosen to preserve total impression volume, e.g., so that ΣRS_(i)⁽²⁾=ΣRS_(I)⁽¹⁾=TotExcess.

A third set of constraints (e.g., block constraints) may ensure that the excess impressions allocated to a region at level 1 equal the sum of excess impressions allocated to regions nested within it at level 2 as follows:

$\sum_{ij:\, pa(ij) = IJ} x_{ij} = x_{IJ}; \quad \text{for all } IJ \qquad (4)$

As understood by those of skill in the art, true impression volumes may satisfy the block constraints. Thus, the block constraints may be imposed during the imputation of impression volumes. Additionally, analogous row, column and block constraints may be imposed at all other levels l (l=0, . . . , L).

In estimating the impression volumes, a set of positive initial prior values {x_(r)(0)} may be identified for all regions r ∈ Z. An aim of the exemplary embodiments of the present invention is to determine a solution {x_(r)} which is as close as possible to the prior initial value {x_(r)(0)} but satisfies all the row, column and block constraints. As understood by those of skill in the art, this process may be equivalent to finding a solution having a smallest discrepancy from the prior distribution in terms of Kullback-Leibler divergence, subject to the constraints. It may also be referred to as a Maximum Entropy model, because, when the prior initial value {x_(r)(0)} is uniform, the solution may maximize Shannon entropy.

In one exemplary embodiment, the Maximum Entropy model may be solved using an Iterative Proportional Fitting (IPF) algorithm, which iterates cyclically over all of the constraints and updates the x_(r) values to match the constraints as closely as possible. Specifically, suppose that at the t^(th) iteration a constraint of the form Σ_(r)k_(r)x_(r)=C is being violated (k_(r)=0 or 1 for all of the constraints), i.e., the current value of the left-hand side, C(t)=Σ_(r)k_(r)x_(r)(t), satisfies C(t)≠C. The IPF algorithm then adjusts each element x_(r) involved in the constraint by the constant factor C/C(t) to obtain the new values x_(r)(t+1)=x_(r)(t)·C/C(t). Updating in this manner may ensure non-negativity of a final solution. The updates may be performed for all constraints until convergence.
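The multiplicative update can be illustrated with a small Python sketch of classical IPF on a single two-dimensional table with only row and column constraints. This is a simplified illustration, not the multi-level procedure: the function name `ipf` and its arguments are hypothetical, the prior is assumed to be strictly positive, and the row and column targets are assumed to have equal sums.

```python
def ipf(prior, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Rescale a positive table until its row and column sums hit the targets."""
    x = [[float(v) for v in row] for row in prior]
    n_rows, n_cols = len(x), len(x[0])
    for _ in range(max_iter):
        # Row constraints: multiply each row by C / C(t).
        for i in range(n_rows):
            factor = row_targets[i] / sum(x[i])
            x[i] = [v * factor for v in x[i]]
        # Column constraints: multiply each column by C / C(t).
        for j in range(n_cols):
            factor = col_targets[j] / sum(x[i][j] for i in range(n_rows))
            for i in range(n_rows):
                x[i][j] *= factor
        # Columns now match exactly; stop once the rows match as well.
        if all(abs(sum(x[i]) - row_targets[i]) <= tol for i in range(n_rows)):
            break
    return x
```

Because every update multiplies by a positive factor, entries that start positive stay positive, which is the non-negativity property noted above.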

The exemplary embodiment of the present invention may jointly estimate all x_(r)'s by iterating through a series of top-down and bottom-up scalings. For a two-level tree, at the t^(th) iteration, start with level 1, and modify {x_(IJ)(t)} to {x_(IJ)(t+1)} after adjusting for the row and column constraints. This changes the values of {x_(ij)(t)}'s at level 2 to {x*_(ij)(t)}'s by adjusting for the corresponding block constraints. At level 2, change the {x*_(ij)(t)}'s to {x_(ij)(t+1)}'s by adjusting for row and column constraints. This completes the top-down step. In the bottom-up step, the leaf regions (in the exemplary embodiment, the regions at level 2) do not change, e.g., x_(ij)(t+2)=x_(ij)(t+1). Using the block constraints, the values at level 1 change to {x*_(IJ)(t+1)}=Σ_(ij:pa(ij)=IJ)x_(ij)(t+2), followed by row and column scalings to satisfy the level 1 constraints, ending with x_(IJ)(t+2). The top-down and bottom-up steps may be iterated until convergence. The algorithm may converge rapidly, requiring, for example, 156 iterations for an error tolerance of 1%.
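The two-level top-down and bottom-up scalings described above can be sketched in Python as follows. This is a hedged illustration rather than a definitive implementation: the names `scale_rows_cols`, `block_sums` and `two_level_fit`, the list-of-lists layout, and the parent maps `pa_row` and `pa_col` (page child to page parent, ad child to ad parent) are all assumptions introduced here, and the targets at the two levels are assumed to be mutually consistent.

```python
def scale_rows_cols(x, row_targets, col_targets):
    """One sweep in place: rescale each row, then each column, to its target."""
    for i, target in enumerate(row_targets):
        s = sum(x[i])
        x[i] = [v * target / s for v in x[i]]
    for j, target in enumerate(col_targets):
        s = sum(row[j] for row in x)
        for row in x:
            row[j] *= target / s


def block_sums(x2, pa_row, pa_col, n_rows1, n_cols1):
    """Sum the level-2 cells nested within each level-1 region."""
    sums = [[0.0] * n_cols1 for _ in range(n_rows1)]
    for i, row in enumerate(x2):
        for j, v in enumerate(row):
            sums[pa_row[i]][pa_col[j]] += v
    return sums


def two_level_fit(x1, x2, pa_row, pa_col, rows1, cols1, rows2, cols2,
                  tol=1e-9, max_iter=500):
    x1 = [[float(v) for v in row] for row in x1]
    x2 = [[float(v) for v in row] for row in x2]
    n1, m1 = len(x1), len(x1[0])
    for _ in range(max_iter):
        # Top-down: level-1 rows/columns, then block scaling, then level 2.
        scale_rows_cols(x1, rows1, cols1)
        sums = block_sums(x2, pa_row, pa_col, n1, m1)
        for i, row in enumerate(x2):
            for j in range(len(row)):
                p, q = pa_row[i], pa_col[j]
                row[j] *= x1[p][q] / sums[p][q]
        scale_rows_cols(x2, rows2, cols2)
        # Bottom-up: leaves unchanged; level 1 becomes the block sums,
        # followed by level-1 row and column scaling.
        blocks = block_sums(x2, pa_row, pa_col, n1, m1)
        x1 = [row[:] for row in blocks]
        scale_rows_cols(x1, rows1, cols1)
        # Columns hold exactly after a sweep; check rows and blocks.
        err = max(abs(sum(x2[i]) - rows2[i]) for i in range(len(x2)))
        err = max(err, max(abs(x1[p][q] - blocks[p][q])
                           for p in range(n1) for q in range(m1)))
        if err <= tol:
            break
    return x1, x2
```

With uniform priors, for example, the sketch reproduces the independence solution in which each level-2 cell equals the product of its row and column targets divided by the total.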

The exemplary algorithm described above with reference to a two-level tree may be extended to a tree with L levels as follows:

Initialization: Begin with a prior {x_(r)(0)} for regions r ∈ Z^((l)) of each level l.
From iteration t to t + 2:
  Begin top-down:
    For all r ∈ Z⁽¹⁾: x_(r)(t) → row constraints → column constraints → x_(r)(t + 1)
    For levels l = 2, ..., L:
      For all r ∈ Z^((l)):
        x_(r)(t) → block constraints with x_(pa(r))(t + 1) on the RHS → x*_(r)(t), where pa(r) is the parent region subsuming r
        x*_(r)(t) → row constraints → column constraints → x_(r)(t + 1)
  Begin bottom-up:
    For all r ∈ Z^((L)): x_(r)(t + 2) = x_(r)(t + 1)
    For levels l = L − 1, ..., 1:
      For all r ∈ Z^((l)):
        x*_(r)(t + 1) = Σ_(k∈ch(r)) x_(k)(t + 2), where ch(r) are all children regions nested within r
        x*_(r)(t + 1) → row constraints → column constraints → x_(r)(t + 2)
Iterate until all constraints are substantially satisfied up to a predefined accuracy factor.

One exemplary variable in the exemplary imputation algorithm is the choice of the prior. Setting x_(r)(0) proportional to lb_(r) may ensure that the excess impressions are distributed in proportion to the lower bounds obtained from the crawled sample as closely as possible subject to the constraints. An alternative is to simply use the traditional IPF algorithm, which starts with a prior x_(r)(0) that is proportional to 1 (i.e., uniform), and computes the x_(r) values for each level separately, using only the row and column constraints. It can be shown that this automatically satisfies the block constraints as well, due to the relationships between the row and column sums at different levels. However, this prior distributes the excess impressions using an independence model and does not incorporate the a priori interaction information in the lower bounds.

After the impression volumes have been imputed to the hierarchy, the CTRs are estimated for all (or selected ones) of the nodes therein. The distribution of raw CTRs may be skewed and the variance may depend on the mean (roughly, Var proportional to mean/N_(r)). In the exemplary embodiment, the count data may be modeled on a transformed scale using the Freeman-Tukey transformation:

$\begin{matrix}{{y_{r} = {\frac{1}{2}\left( {\sqrt{\frac{c_{r}}{{\hat{N}}_{r}}} + \sqrt{\frac{c_{r} + 1}{{\hat{N}}_{r}}}} \right)}},} & (5)\end{matrix}$

In the above transformation, c_(r) is the number of clicks in the region r and $\hat{N}_{r}$ is the imputed number of impressions, determined from the imputation algorithm described above. The second term in the transformation distinguishes between zeros on the basis of the number of impressions, e.g., zero clicks from 100 impressions corresponds to a smaller transformed CTR than zero clicks from only 10 impressions. The transformation may also provide symmetry to an otherwise skewed rate distribution and provide a variance stabilization property, making the variance of the distribution independent of the mean (roughly, Var proportional to 1/$\hat{N}_{r}$). In an alternative exemplary embodiment, a square-root transform may be utilized to model the data on a transformed scale.
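A direct Python transcription of Equation 5 illustrates how the second term separates regions with zero clicks. The function name `freeman_tukey` is chosen here for illustration only.

```python
import math


def freeman_tukey(clicks, impressions):
    """Transformed CTR y_r of Equation 5 for c_r clicks and N_r-hat impressions."""
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))


# Zero clicks from 100 impressions gives about 0.05, whereas zero clicks
# from only 10 impressions gives about 0.158.
y_many = freeman_tukey(0, 100)
y_few = freeman_tukey(0, 10)
```

A region with no clicks but many impressions is therefore assigned a smaller transformed CTR than a region with no clicks and few impressions.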

As stated above in the description of method 500, the Markovian model may be used as a generative model to calculate the CTRs from the imputed impression volumes. Let u_(r) denote the covariate vector for region r. In the exemplary dataset, u_(r)^(T)=1 for all r, which corresponds to one covariate for each level in the region hierarchy. Conditional on the states {S_(r)} being known, assume the observations y_(r) to be independently distributed as a Gaussian:

$y_{r} \mid S_{r}, \beta^{(d(r))} \sim N\left(u_{r}^{T}\beta^{(d(r))} + S_{r},\; V_{r}\right), \qquad (6)$

The β^((d(r))) is the unknown coefficient vector attached to covariates at level d(r), and V_(r) is the unknown variance parameter. The latent S_(r) variables are adjusting for effects that are not accounted for by the covariates. However, estimating one S_(r) per region leads to severe overfitting; hence smoothing on S_(r)'s is necessary. The smoothing step is performed by exploiting dependencies induced by the tree structure of regions:

S _(r) =S _(pa(r)) +w _(r),  (7)

The w_(r) is distributed as N(0, W_(r)) for all r ∈ Z \ Z⁽⁰⁾. Also, w_(r) is independent of S_(pa(r)), and S_(Root)=W_(Root)=0. FIG. 6 shows an exemplary embodiment of the generative model for two levels.
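The state equation (Equation 7) can be illustrated by simulating states down a small tree. The following Python sketch is illustrative only: the `children` dictionary, the list `level_variances` (whose entry at index d−1 stands for the variance W at depth d), and the function name `simulate_states` are hypothetical, and all regions at the same depth are assumed to share one variance.

```python
import random


def simulate_states(children, level_variances, root="root", seed=0):
    """Draw S_r = S_pa(r) + w_r with w_r ~ N(0, W) for every node below the root."""
    rng = random.Random(seed)
    states = {root: 0.0}  # S_Root = 0
    frontier = [(root, 0)]
    while frontier:
        node, depth = frontier.pop()
        for child in children.get(node, []):
            # The child sits at depth + 1 and uses the variance for that depth.
            sigma = level_variances[depth] ** 0.5
            states[child] = states[node] + rng.gauss(0.0, sigma)
            frontier.append((child, depth + 1))
    return states
```

Sibling states share the draw of their common parent, which is the source of the correlation between regions that the smoothing exploits.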

In the exemplary embodiment, rather than estimating a separate W_(r) and V_(r) for each region, it may be assumed that all regions at the same level have the same W_(r) value: W_(r)=W^((l)) for all r ∈ Z^((l)). Modeling assumptions on V_(r) depend on the data and the tree structure of regions. In the present example, Var(y_(r)) is proportional to 1/$\hat{N}_{r}$ (from Equation 5). Thus, assume that there is a V such that V_(r)=V/$\hat{N}_{r}$ for all r ∈ Z^((l)).

The ratios W_(r)/V_(r) determine the amount of smoothing that takes place in the Markovian model. If W_(r) is large relative to V_(r), the sibling S_(r)'s are drawn from a distribution that has high variance and hence little smoothing takes place. According to one embodiment, if W_(r)/V_(r)→∞, then S_(r)→(y_(r)−u_(r)^(T)β^((d(r)))) and the training data is perfectly fit. On the other extreme, if W_(r)/V_(r)→0, then S_(r)→0 and the fit is a regression model given by the covariates, with the maximum possible smoothing.
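The two limiting cases can be illustrated with a deliberately simplified single-region calculation, which is an assumption made here for illustration and not the tree-structured model itself. If a single state S ~ N(0, W) is observed through y | S ~ N(μ + S, V), standard Gaussian conjugacy gives the posterior mean E[S | y] = (W/(W + V))(y − μ). The function name and arguments below are hypothetical.

```python
def posterior_state_mean(y, mu, w, v):
    """Posterior mean of S given y, for S ~ N(0, w) and y | S ~ N(mu + S, v)."""
    if w == 0:
        return 0.0  # no state variance: maximum smoothing, S is pinned to 0
    shrinkage = w / (w + v)  # tends to 1 as w/v grows, to 0 as w/v shrinks
    return shrinkage * (y - mu)
```

When `w` is very large relative to `v`, the result approaches the raw residual `y - mu`; when `w` is small relative to `v`, the result is pulled toward zero.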

From the above description, one or more correlations may be implied by the Markovian model. For example, from Equation 7 and the independence of w_(r) and S_(pa(r)), it follows that:

$\begin{matrix}{{{Var}\left( S_{r} \right)} = {\sum\limits_{i = 1}^{d{(r)}}\; W^{(i)}}} & (8)\end{matrix}$

Thus, the variance in the states S_(r) depends only on the depth of region r, and increases when moving from coarser to finer resolutions.

For any two regions r1 and r2 at depth l sharing a least common ancestor q at depth l′<l, the covariance between the state values is given by Cov(S_(r1), S_(r2))=Var(S_(q)), which depends only on l′. Thus, the correlation coefficient of nodes at level l whose least common ancestor is at level l′ is given by

$\begin{matrix}{{{Corr}\left( {l,l^{\prime}} \right)} = \frac{\sum\limits_{i = 1}^{l^{\prime}}\; W^{(i)}}{\sum\limits_{i = 1}^{l}\; W^{(i)}}} & (9)\end{matrix}$

The correlation coefficient Corr(l,l′) depends only on the level of the regions and the distance to their least common ancestor. The y_(r)'s may be independent conditional on S_(r)'s, but the dependencies in S_(r)'s impose dependencies in the marginal distribution of y_(r)'s.
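By way of non-limiting illustration, Equations 8 and 9 may be evaluated directly from the per-level variances. The following is a minimal sketch; the function names `state_variance` and `level_correlation` and the values stored in `W` are hypothetical and are chosen only for illustration.

```python
def state_variance(W, depth):
    """Var(S_r) for a region at the given depth (Equation 8).

    W[i - 1] holds the level variance W^(i).
    """
    return sum(W[:depth])


def level_correlation(W, l, l_prime):
    """Corr(l, l') of Equation 9 for two regions at level l whose least
    common ancestor is at level l' < l."""
    return state_variance(W, l_prime) / state_variance(W, l)


W = [0.5, 0.3, 0.2]  # hypothetical W^(1), W^(2), W^(3)
print(state_variance(W, 3))        # variance grows with depth
print(level_correlation(W, 3, 1))  # siblings sharing only a level-1 ancestor
print(level_correlation(W, 3, 2))  # siblings sharing a level-2 ancestor
```

Regions whose least common ancestor is deeper in the tree share more of their accumulated variance and are therefore more strongly correlated.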

As explained and described above, the EM algorithm may be used to estimate the posterior distribution of {S_(r)}'s and {β^((d(r)))}'s and provide point estimates of the variance components {W^((l))} and V. Implementation of the EM algorithm may utilize a Kalman filtering step for efficiently estimating the posterior distributions of {S_(r)}'s for fixed values of the variance components. The Kalman filtering algorithm itself consists of two steps, namely, a filtering step that aggregates information from the leaves up to the root, followed by a smoothing step that propagates the aggregated information in the root downwards to the leaves. To provide intuition on the filtering step, note that the state equations may be inverted to express parent states in terms of their children's states:

$\begin{matrix}\begin{matrix}{S_{pa(r)} = {E\left( S_{pa(r)} \mid S_{r} \right)} + \left( S_{pa(r)} - E\left( S_{pa(r)} \mid S_{r} \right) \right)} \\{= {B_{r}S_{r}} + \psi_{r}}\end{matrix} & (10)\end{matrix}$

where

$B_{r} = \sum\limits_{i = 1}^{d(r) - 1} W^{(i)} \Big/ \sum\limits_{i = 1}^{d(r)} W^{(i)}, \quad E\left\lbrack \psi_{r} \right\rbrack = 0 \quad \text{and} \quad \mathrm{Var}\left( \psi_{r} \right) = W^{(d(r))}B_{r}$
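By way of non-limiting illustration, the quantities B_(r) and Var(ψ_(r)) of Equation 10 may be computed from the per-level variances and checked numerically. Because ψ_(r) is uncorrelated with S_(r), Var(S_(pa(r))) should equal B_(r)²Var(S_(r)) + Var(ψ_(r)). The following is a minimal sketch; the function name `reverse_transition` and the values stored in `W` are hypothetical.

```python
def reverse_transition(W, depth):
    """Return (B_r, Var(psi_r)) for a region at the given depth >= 1.

    W[i - 1] holds the level variance W^(i); the values are illustrative.
    """
    var_child = sum(W[:depth])       # Var(S_r), Equation 8
    var_parent = sum(W[:depth - 1])  # Var(S_pa(r)); zero when the parent is the root
    B = var_parent / var_child
    var_psi = W[depth - 1] * B
    return B, var_psi


W = [0.5, 0.3, 0.2]  # hypothetical W^(1), W^(2), W^(3)
B, var_psi = reverse_transition(W, depth=3)
# Consistency check: B^2 * Var(S_r) + Var(psi_r) recovers Var(S_pa(r)).
assert abs(B * B * sum(W[:3]) + var_psi - sum(W[:2])) < 1e-12
print(B, var_psi)
```

For a region at depth 1 the parent is the root, whose state is fixed at zero, so both returned values are zero in this sketch.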

Beginning with initial estimates for {W^((l))(0)}, V(0), and {β^((d(r)))(0)}, the EM algorithm may use these in the Kalman filtering and smoothing steps, recomputing the variance and covariate components, and repeating the process until convergence. At step t+1, the EM algorithm first computes the expected log-likelihood of the conditional distribution of all the state variables {S_(r)} given the current estimates of all variance and covariate components {W^((l))(t)}, V(t), {β^((l))(t)} and the data {y_(r)}. This step uses the posterior distributions of the state variables from the Kalman filtering and smoothing steps. Subsequently, the parameters {W^((l))(t+1)}, V(t+1), {β^((l))(t+1)} are determined which maximize this expected log-likelihood. The new estimates are used at the next iteration of the EM algorithm.

The Kalman filtering step may be implemented as follows:

Filtering: Define, for all r ∈ Z, the following quantities:

$e_{r} = y_{r} - u_{r}^{T}{\bar{\beta}}^{(d(r))}; \quad {\bar{B}}_{r} = \frac{\sum\limits_{i = 1}^{d(r) - 1} W^{(i)}}{\sum\limits_{i = 1}^{d(r)} W^{(i)}}$

$\sigma_{r} = \sum\limits_{i = 1}^{d(r)} W^{(i)}; \quad R_{r} = {\bar{B}}_{r}W_{r} = {\bar{B}}_{r}W^{(d(r))}$

For the leaf regions r ∈ Z^((L)), compute:

$\hat{S}_{r|r} = \sigma_{r}e_{r}/\left( \sigma_{r} + V_{r} \right); \quad \Gamma_{r|r} = \sigma_{r}V_{r}/\left( \sigma_{r} + V_{r} \right)$

For non-leaf nodes r ∈ Z \ Z^((L)), let k_(r) denote the number of children regions under r, and let c_(i)(r) denote the i^(th) such child. Then, compute:

$\hat{S}_{r|c_{i}(r)} = {\bar{B}}_{c_{i}(r)}\hat{S}_{c_{i}(r)|c_{i}(r)}$

$\Gamma_{r|c_{i}(r)} = {\bar{B}}_{c_{i}(r)}\Gamma_{c_{i}(r)|c_{i}(r)}{\bar{B}}_{c_{i}(r)} + R_{c_{i}(r)}$

$\hat{S}_{r|r}^{*} = \Gamma_{r|r}^{*}\left( \sum\limits_{i = 1}^{k_{r}} \Gamma_{r|c_{i}(r)}^{-1}\hat{S}_{r|c_{i}(r)} \right)$

$\Gamma_{r|r}^{*} = \left\{ \sigma_{r}^{-1} + \sum\limits_{i = 1}^{k_{r}} \left( \Gamma_{r|c_{i}(r)}^{-1} - \sigma_{r}^{-1} \right) \right\}^{-1}$

$\hat{S}_{r|r} = \Gamma_{r|r}\left( V_{r}^{-1}e_{r} + \left( \Gamma_{r|r}^{*} \right)^{-1}\hat{S}_{r|r}^{*} \right)$

$\Gamma_{r|r} = \Gamma_{r|r}^{*} - \Gamma_{r|r}^{*}\left( \Gamma_{r|r}^{*} + V_{r} \right)^{-1}\Gamma_{r|r}^{*}$

Smoothing: Set the values Ŝ_(r) = Ŝ_(r|r) and Γ_(r) = Γ_(r|r) for all r ∈ Z^((1)). For all other levels r ∈ Z \ Z^((1)), compute:

$\hat{S}_{r} = \hat{S}_{r|r} + \Gamma_{r|r}{\bar{B}}_{r}\Gamma_{pa(r)|r}^{-1}\left( \hat{S}_{pa(r)} - \hat{S}_{pa(r)|r} \right)$

$\Gamma_{r} = \Gamma_{r|r} + \Gamma_{r|r}{\bar{B}}_{r}^{2}\Gamma_{pa(r)|r}^{-1}\left( \Gamma_{pa(r)} - \Gamma_{pa(r)|r} \right)\Gamma_{pa(r)|r}^{-1}\Gamma_{r|r}$

$\Gamma_{r,pa(r)} = \Gamma_{r|r}{\bar{B}}_{r}\Gamma_{pa(r)|r}^{-1}\Gamma_{pa(r)}$

Expectation Maximization: Define the following:

$e_{r}(t) = y_{r} - u_{r}^{T}{\hat{\beta}}^{(d(r))}(t)$

$Q^{(l)}(t + 1) = \frac{\sum\limits_{r \in Z^{(l)}} \left( \Gamma_{r} + \left( \hat{S}_{r} - e_{r}(t) \right)^{2} \right){\hat{N}}_{r}}{\left| Z^{(l)} \right|}$

Then, compute:

$V(t + 1) = \frac{\sum\limits_{l} \left| Z^{(l)} \right| \cdot Q^{(l)}(t + 1)}{\sum\limits_{l} \left| Z^{(l)} \right|}$

$W^{(l)}(t + 1) = \frac{\sum\limits_{r \in Z^{(l)}} \left( \Gamma_{r} + \Gamma_{pa(r)} - 2\Gamma_{r,pa(r)} + \left( \hat{S}_{r} - \hat{S}_{pa(r)} \right)^{2} \right)}{\left| Z^{(l)} \right|}$

The value of β̂^((l))(t + 1) at each level l is obtained by performing a weighted least squares at level l with V(t + 1) as the estimate of V.
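By way of non-limiting illustration, the filtering and smoothing recursions may be sketched for scalar states as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function name `tree_kalman`, the dictionary-based tree representation, and the numeric inputs are hypothetical; the residuals e_(r) are taken as given; every W^((l)) is assumed positive; and smoothing is assumed to start at the level-1 regions, whose parent is the root with S_(Root)=0. The EM updates of V, W^((l)) and β are not shown.

```python
from collections import defaultdict


def tree_kalman(parent, depth, e, V, W):
    """Upward filtering and downward smoothing on a tree of regions.

    parent: region -> parent region (None for level-1 regions).
    depth:  region -> depth d(r) >= 1.
    e, V:   region -> residual e_r and observation variance V_r.
    W:      list with W[l - 1] = W^(l), assumed positive.
    Returns smoothed means S_r and variances Gamma_r.
    """
    children = defaultdict(list)
    for r, p in parent.items():
        if p is not None:
            children[p].append(r)
    sigma = {r: sum(W[:depth[r]]) for r in parent}
    B = {r: sum(W[:depth[r] - 1]) / sigma[r] for r in parent}
    R = {r: B[r] * W[depth[r] - 1] for r in parent}

    S_f, G_f = {}, {}    # filtered S_{r|r}, Gamma_{r|r}
    S_up, G_up = {}, {}  # S_{pa(r)|r}, Gamma_{pa(r)|r}
    # Filtering: visit the deepest regions first so children precede parents.
    for r in sorted(parent, key=lambda x: -depth[x]):
        if not children[r]:  # leaf region
            S_f[r] = sigma[r] * e[r] / (sigma[r] + V[r])
            G_f[r] = sigma[r] * V[r] / (sigma[r] + V[r])
        else:
            # Merge the children's predictions, counting the prior only once.
            prec = 1.0 / sigma[r] + sum(
                1.0 / G_up[c] - 1.0 / sigma[r] for c in children[r])
            G_star = 1.0 / prec
            S_star = G_star * sum(S_up[c] / G_up[c] for c in children[r])
            # Fold in the region's own observation.
            G_f[r] = G_star - G_star * G_star / (G_star + V[r])
            S_f[r] = G_f[r] * (e[r] / V[r] + S_star / G_star)
        S_up[r] = B[r] * S_f[r]
        G_up[r] = B[r] * G_f[r] * B[r] + R[r]

    # Smoothing: visit the shallowest regions first so parents precede children.
    S, G = {}, {}
    for r in sorted(parent, key=lambda x: depth[x]):
        p = parent[r]
        if p is None:
            S[r], G[r] = S_f[r], G_f[r]
        else:
            J = G_f[r] * B[r] / G_up[r]
            S[r] = S_f[r] + J * (S[p] - S_up[r])
            G[r] = G_f[r] + J * (G[p] - G_up[r]) * J
    return S, G


# Hypothetical example: one level-1 region "A" with two leaf children.
parent = {"A": None, "B": "A", "C": "A"}
depth = {"A": 1, "B": 2, "C": 2}
e = {"A": 1.0, "B": 2.0, "C": 0.0}
V = {"A": 1.0, "B": 1.0, "C": 1.0}
S, G = tree_kalman(parent, depth, e, V, W=[1.0, 1.0])
print(S, G)
```

For these inputs the smoothed means are 2/3, 4/3 and 1/3 for A, B and C, with variances 1/3, 7/12 and 7/12, which agrees with solving the three-variable Gaussian posterior directly. Region C, whose own residual is zero, is pulled toward its parent, which is the smoothing effect described above.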

FIGS. 1 through 6 are conceptual illustrations allowing for an explanation of the present invention. It should be understood that various aspects of the embodiments of the present invention could be implemented in hardware, firmware, software, or combinations thereof. In such embodiments, the various components and/or steps would be implemented in hardware, firmware, and/or software to perform the functions of the present invention. That is, the same piece of hardware, firmware, or module of software could perform one or more of the illustrated blocks (e.g., components or steps).

In software implementations, computer software (e.g., programs or other instructions) and/or data is stored on a machine readable medium as part of a computer program product, and is loaded into a computer system or other device or machine via a removable storage drive, hard drive, or communications interface. Computer programs (also called computer control logic or computer readable program code) are stored in a main and/or secondary memory, and executed by one or more processors (controllers, or the like) to cause the one or more processors to perform the functions of the invention as described herein. In this document, the terms “machine readable medium,” “computer program medium” and “computer usable medium” are used to generally refer to media such as a random access memory (RAM); a read only memory (ROM); a removable storage unit (e.g., a magnetic or optical disc, flash memory device, or the like); a hard disk; electronic, electromagnetic, optical, acoustical, or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); or the like.

Notably, the figures and examples above are not meant to limit the scope of the present invention to a single embodiment, as other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Moreover, where certain elements of the present invention can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention are described, and detailed descriptions of other portions of such known components are omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not necessarily be limited to other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the known components referred to herein by way of illustration.

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the relevant art(s) (including the contents of the documents cited and incorporated by reference herein), readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Such adaptations and modifications are therefore intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance presented herein, in combination with the knowledge of one skilled in the relevant art(s).

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It would be apparent to one skilled in the relevant art(s) that various changes in form and detail could be made therein without departing from the spirit and scope of the invention. Thus, the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

1. A computer-implemented method, comprising: electronically, obtaining, via a processing device, a sample set of content items, each of the content items including a plurality of features associated with at least one region in a hierarchical data structure, the hierarchical data structure comprising nodes in an advertisement taxonomy hierarchy and nodes in a page taxonomy hierarchy, with the at least one region identified by a combination of nodes from the advertisement taxonomy hierarchy and nodes from the page taxonomy hierarchy, wherein the sample set is representation of a whole set of content items including features associated with the at least one region; determining a first impression volume for each of the features corresponding to at least one region as a function of a number of impressions registered for a given content item from the sample set of content items; applying a scale factor to the first impression volume to generate a second impression volume, the scale factor being selected so that the second impression volume is within a predefined range of a third impression volume; electronically, estimating, via the processing device, a click-through-rate (CTR) as a function of the second impression volume and a number of clicks on the content item.
2. The method according to claim 1, wherein the content items include at least one of webpages and ads.
3. The method according to claim 1, wherein the obtaining includes: identifying first content items that have been clicked; identifying a predetermined number of second content items that have not been clicked; and generating the sample set as a function of the first and second content items.
4. The method according to claim 3, further comprising: calculating the first impression volume as a function of the impressions for the first and second content items.
5. The method according to claim 1, wherein the third impression volume is a total number of impressions associated within a preselected level in the hierarchical data structure.
6. The method according to claim 1, wherein the estimating includes: assigning a state variable to each of the at least one region; and applying a Markovian model to the state variable to estimate the CTR.
7. The method according to claim 6, wherein the applying includes: computing a posterior for the state variable using a Kalman filter; and propagating the posterior to the at least one region; and repeating the computing and the propagating until convergence of the state variable to the CTR.
8. The method according to claim 7, further comprising: upon the convergence, identifying the CTR for the at least one region.
9. The method according to claim 1, further comprising: storing the CTR on a storage medium.
10. Computer readable media comprising program code that when executed by a programmable processor causes the processor to execute a method, the method comprising: obtaining a sample set of content items, each of the content items including a plurality of features associated with at least one region in a hierarchical data structure, the hierarchical data structure comprising nodes in an advertisement taxonomy hierarchy and nodes in a page taxonomy hierarchy, with the at least one region identified by a combination of nodes from the advertisement taxonomy hierarchy and nodes from content items including features associated with the at least one region; determining a first impression volume for each of the features corresponding to at least one region as a function of a number of impressions registered for a given content item from the sample set of content items; applying a scale factor to the first impression volume to generate a second impression volume, the scale factor being selected so that the second impression volume is within a predefined range of a third impression volume; estimating a click-through-rate (CTR) as a function of the second impression volume and a number of clicks on the content item.
11. The computer readable media of claim 10, wherein the content items include at least one of webpages and ads.
12. The computer readable media of claim 10, wherein the obtaining includes: identifying first content items that have been clicked; identifying a predetermined number of second content items that have not been clicked; and generating the sample set as a function of the first and second content items.
13. The computer readable media of claim 12, further comprising: calculating the first impression volume as a function of the impressions for the first and second content items.
14. The computer readable media of claim 10, wherein the third impression volume is a total number of impressions associated within a preselected level in the hierarchical data structure.
15. The computer readable media of claim 10, wherein the estimating includes: assigning a state variable to each of the at least one region; and applying a Markovian model to the state variable to estimate the CTR.
16. The computer readable media of claim 15, wherein the applying includes: computing a posterior for the state variable using a Kalman filter; and propagating the posterior to the at least one region; and repeating the computing and the propagating until convergence of the state variable to the CTR.
17. The computer readable media of claim 16, further comprising: upon the convergence, identifying the CTR for the at least one region.
18. A system comprising a processor and a memory device storing executable instructions thereon that when executed causes the processor to perform a method comprising: obtaining a sample set of content items, each of the content items including a plurality of features associated with at least one region in a hierarchical data structure, the hierarchical data structure comprising nodes in an advertisement taxonomy hierarchy and nodes in a page taxonomy hierarchy, with the at least one region identified by a combination of nodes from the advertisement taxonomy hierarchy and nodes from the page taxonomy hierarchy, wherein the sample set is representation of a whole set of content items including features associated with the at least one region; determining a first impression volume for each of the features corresponding to at least one region as a function of a number of impressions registered for a given content item from the sample set of content items; applying a scale factor to the first impression volume to generate a second impression volume, the scale factor being selected so that the second impression volume is within a predefined range of a third impression volume; estimating a click-through-rate (CTR) as a function of the second impression volume and a number of clicks on the content item.
19. The system of claim 18, wherein the content items include at least one of webpages and ads.
20. The system of claim 18, wherein the obtaining includes: identifying first content items that have been clicked; identifying a predetermined number of second content items that have not been clicked; and generating the sample set as a function of the first and second content items.
21. The system of claim 20, further comprising: calculating the first impression volume as a function of the impressions for the first and second content items.
22. The system of claim 18, wherein the third impression volume is a total number of impressions associated within a preselected level in the hierarchical data structure.
23. The system of claim 18, wherein the estimating includes: assigning a state variable to each of the at least one region; and applying a Markovian model to the state variable to estimate the CTR.
24. The system of claim 23, wherein the applying includes: computing a posterior for the state variable using a Kalman filter; and propagating the posterior to the at least one region; and repeating the computing and the propagating until convergence of the state variable to the CTR.
25. The system of claim 24, further comprising: upon the convergence, identifying the CTR for the at least one region.